Search in the Catalogues and Directories

Hits 1 – 20 of 29

1
The effect of domain and diacritics in Yorùbá-English neural machine translation
In: 18th Biennial Machine Translation Summit, Aug 2021, Orlando, United States (2021)
Abstract: Massively multilingual machine translation (MT) has shown impressive capabilities, including zero and few-shot translation between low-resource language pairs. However, these models are often evaluated on high-resource languages with the assumption that they generalize to low-resource ones. The difficulty of evaluating MT models on low-resource pairs is often due to lack of standardized evaluation datasets. In this paper, we present MENYO-20k, the first multi-domain parallel corpus with a special focus on clean orthography for Yorùbá-English with standardized train-test splits for benchmarking. We provide several neural MT benchmarks and compare them to the performance of popular pre-trained (massively multilingual) MT models both for the heterogeneous test set and its subdomains. Since these pre-trained models use huge amounts of data with uncertain quality, we also analyze the effect of diacritics, a major characteristic of Yorùbá, in the training data. We investigate how and when this training condition affects the final quality and intelligibility of a translation. Our models outperform massively multilingual models such as Google (+8.7 BLEU) and Facebook M2M (+9.1 BLEU) when translating to Yorùbá, setting a high quality benchmark for future research.
Keyword: [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
URL: https://hal.inria.fr/hal-03350967
https://hal.inria.fr/hal-03350967/document
https://hal.inria.fr/hal-03350967/file/adelani_MTSummit2021.pdf
BASE
2
Europarl Direct Translationese Dataset ...
BASE
3
Europarl Direct Translationese Dataset ...
BASE
4
Europarl Direct Translationese Dataset ...
BASE
5
A Data Augmentation Approach for Sign-Language-To-Text Translation In-The-Wild ...
Nunnari, Fabrizio; España-Bonet, Cristina; Avramidis, Eleftherios. - : Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021
BASE
6
The Effect of Domain and Diacritics in Yorùbá-English Neural Machine Translation ...
BASE
7
Integrating Unsupervised Data Generation into Self-Supervised Neural Machine Translation for Low-Resource Languages ...
BASE
8
Comparing Feature-Engineering and Feature-Learning Approaches for Multilingual Translationese Classification ...
BASE
9
Comparing Feature-Engineering and Feature-Learning Approaches for Multilingual Translationese Classification ...
BASE
10
Automatic classification of human translation and machine translation : a study from the perspective of lexical diversity
Fu, Yingxue; Nederhof, Mark Jan. - : Linkoping University Electronic Press, 2021
BASE
11
Tailoring and Evaluating the Wikipedia for in-Domain Comparable Corpora Extraction ...
BASE
12
WTC1.1 (WikiTailor corpus v. 1.1) ...
BASE
13
MT models for multilingual CLuBS engine (en-de-fr-es) ...
BASE
14
WTC1.0 (WikiTailor corpus v. 1.0) ...
BASE
15
WTC1.1 (WikiTailor corpus v. 1.1) ...
BASE
16
MT models for multilingual CLuBS engine (en-de-fr-es) ...
BASE
17
Multilingual and Interlingual Semantic Representations for Natural Language Processing: A Brief Introduction
In: Computational Linguistics, Vol 46, Iss 2, Pp 249-255 (2020)
BASE
18
GeBioToolkit: Automatic Extraction of Gender-Balanced Multilingual Corpus of Wikipedia Biographies ...
BASE
19
Massive vs. Curated Word Embeddings for Low-Resourced Languages. The Case of Yorùbá and Twi ...
BASE
20
Query Translation for Cross-lingual Search in the Academic Search Engine PubPsych ...
BASE
